Derivations

Neural Information Processing Systems

Lemma 1 (Ensemble Sample Diversity Decomposition). Let $\rho$ denote the state-action visit distribution of the ensemble policy, and let $H(\rho)$ be the entropy of this distribution. By the definition of mutual information,

$I(\rho; z) = H(\rho) - H(\rho \mid z) = H(z) - H(z \mid \rho)$ (4)

Because the latent variable $z$ is selected at random, $H(z)$ is treated as a constant that depends only on the number of values $z$ can take.

Lemma 3. Let $X_1, X_2, \ldots, X_N$ be a sequence of i.i.d. random variables. The PDF of $X_{N:N}$ can be derived by taking the derivative of its CDF.
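
The differentiation step in Lemma 3 can be written out as a short worked derivation. This is a sketch under stated assumptions: $X_{N:N}$ is read as the largest of the $N$ variables (the usual order-statistic notation), and the symbols $F$ and $f$ for the common CDF and PDF of each $X_i$ are introduced here for illustration and do not appear in the text above.

```latex
% Assumption: F and f denote the common CDF and PDF of each X_i
% (hypothetical symbols, introduced only for this sketch).
% Assumption: X_{N:N} is the maximum of X_1, ..., X_N.
\begin{align*}
F_{N:N}(x) &= \Pr\Bigl(\max_{1 \le i \le N} X_i \le x\Bigr)
            = \prod_{i=1}^{N} \Pr(X_i \le x)   % independence
            = F(x)^{N},                         % identical distribution
\\
f_{N:N}(x) &= \frac{d}{dx}\, F_{N:N}(x)
            = N \, F(x)^{N-1} f(x).             % chain rule
\end{align*}
```

As a quick sanity check, if each $X_i$ is uniform on $[0, 1]$ then $F(x) = x$ and $f(x) = 1$, so the sketch gives $f_{N:N}(x) = N x^{N-1}$ on $[0, 1]$, which integrates to 1 as a density should.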